As a result of the COVID-19 pandemic, the demand on telecommunication for remote learning/working and telehealth has significantly increased. Mobile Edge Caching (MEC) in 6G networks has evolved as an efficient solution to meet the phenomenal growth of global mobile data traffic by bringing multimedia content closer to the users. Although the massive connectivity enabled by MEC networks will significantly improve the quality of communications, several key challenges lie ahead. The limited storage of edge nodes, the large size of multimedia content, and time-variant user preferences make it critical to efficiently and dynamically predict the popularity of content, so that the most likely upcoming requests are stored before being requested. Recent advances in Deep Neural Networks (DNNs) have drawn much research attention to predicting content popularity in proactive caching schemes. Existing DNN models in this context, however, suffer from long-term dependency issues, computational complexity, and unsuitability for parallel computing. To tackle these challenges, we propose an edge caching framework incorporating an attention-based Vision Transformer (ViT) neural network, referred to as Transformer-based Edge (TEDGE) caching, which, to the best of our knowledge, is studied here for the first time. Moreover, the TEDGE caching framework requires no data pre-processing or additional contextual information. Simulation results corroborate the effectiveness of the proposed TEDGE caching framework in comparison to its counterparts.
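To make the idea concrete, below is a minimal PyTorch sketch of how an attention-based popularity predictor of this kind could look. The input layout (a window of past request counts per content item), all layer sizes, and the names (`PopularityTransformer`, `request_history`) are illustrative assumptions, not the TEDGE specification.

```python
# Minimal sketch of a Transformer-based popularity predictor in the spirit of
# TEDGE caching; the layer sizes and the input layout (a window of past
# request counts per content item) are assumptions, not the paper's spec.
import torch
import torch.nn as nn

class PopularityTransformer(nn.Module):
    def __init__(self, num_contents=1000, window=32, d_model=64, nhead=4, depth=2):
        super().__init__()
        self.embed = nn.Linear(window, d_model)          # one token per content item
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(d_model, 1)                # popularity score per item

    def forward(self, request_history):                  # (batch, num_contents, window)
        x = self.embed(request_history)
        x = self.encoder(x)                              # attention across content items
        return self.head(x).squeeze(-1)                  # (batch, num_contents)

model = PopularityTransformer()
history = torch.rand(1, 1000, 32)                        # hypothetical past request counts
scores = model(history)
cache_ids = scores.topk(k=50, dim=-1).indices            # items to prefetch at the edge
```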
Lung cancer is the leading cause of mortality from cancer worldwide and has various histologic types, among which Lung Adenocarcinoma (LUAC) has recently been the most prevalent. Lung adenocarcinomas are classified as pre-invasive, minimally invasive, and invasive adenocarcinomas. Timely and accurate knowledge of the invasiveness of lung nodules leads to a proper treatment plan and reduces the risk of unnecessary or late surgeries. Currently, the primary imaging modality to assess and predict the invasiveness of LUACs is chest CT. Results based on CT images, however, are subjective and suffer from low accuracy compared to the ground truth pathological reviews provided after surgical resection. In this paper, a predictive transformer-based framework, referred to as the "CAE-Transformer", is developed to classify LUACs. The CAE-Transformer utilizes a Convolutional Auto-Encoder (CAE) to automatically extract informative features from CT slices, which are then fed into a modified transformer model to capture global inter-slice relations. Experimental results on our in-house dataset of 114 pathologically proven Sub-Solid Nodules (SSNs) demonstrate the superiority of the CAE-Transformer over histogram/radiomics-based models and its deep learning-based counterparts, achieving an accuracy of 87.73%, sensitivity of 88.67%, specificity of 86.33%, and AUC of 0.913, using 10-fold cross-validation.
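A minimal sketch of the described pipeline is given below: a convolutional encoder turns each CT slice into a feature vector, and a Transformer encoder models relations across the slice sequence. All sizes and module names are illustrative assumptions; the paper's exact CAE and transformer configuration may differ.

```python
# Hedged sketch of the CAE-Transformer idea: per-slice convolutional features
# followed by inter-slice attention; not the authors' exact architecture.
import torch
import torch.nn as nn

class SliceEncoder(nn.Module):                      # stand-in for the CAE encoder
    def __init__(self, d_model=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, d_model)

    def forward(self, slices):                      # (batch*n_slices, 1, H, W)
        return self.proj(self.conv(slices).flatten(1))

class CAETransformer(nn.Module):
    def __init__(self, d_model=128, n_classes=3):
        super().__init__()
        self.slice_encoder = SliceEncoder(d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(d_model, n_classes)    # pre-invasive / minimally invasive / invasive

    def forward(self, volume):                      # (batch, n_slices, 1, H, W)
        b, s = volume.shape[:2]
        feats = self.slice_encoder(volume.flatten(0, 1)).view(b, s, -1)
        feats = self.encoder(feats)                 # capture inter-slice relations
        return self.cls(feats.mean(dim=1))          # pool slices, then classify

logits = CAETransformer()(torch.rand(2, 10, 1, 64, 64))
```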
The objective of this study is to develop a robust deep learning-based framework to distinguish COVID-19, Community-Acquired Pneumonia (CAP), and Normal cases based on chest CT scans acquired in different imaging centers using various protocols and radiation doses. We show that although our proposed model is trained on a relatively small dataset acquired from only one imaging center using a specific scanning protocol, it performs well on heterogeneous test sets obtained by multiple scanners using different technical parameters. We also show that the model can be updated via an unsupervised approach to cope with the data shift between the train and test sets and to enhance its robustness upon receiving a new external dataset from a different center. We adopted an ensemble architecture to aggregate the predictions from multiple versions of the model. For initial training and development purposes, an in-house dataset of 171 COVID-19, 60 CAP, and 76 Normal cases was used, containing volumetric CT scans acquired from one imaging center using a constant standard radiation dose scanning protocol. To evaluate the model, we retrospectively collected four different test sets to investigate the effects of shifts in data characteristics on the model's performance. Among the test cases were CT scans with characteristics similar to the train set, as well as noisy low-dose and ultra-low-dose CT scans. In addition, some test CT scans were obtained from patients with a history of cardiovascular diseases or surgeries. The entire test dataset used in this study contained 51 COVID-19, 28 CAP, and 51 Normal cases. Experimental results indicate that our proposed framework performs well on all test sets, achieving a total accuracy of 96.15% (95% CI: [91.25-98.74]), COVID-19 sensitivity of 96.08% (95% CI: [86.54-99.5]), and CAP sensitivity of 92.86% (95% CI: [76.50-99.19]).
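The ensemble step described above could be realized as simple soft voting over model versions; the sketch below assumes probability averaging, since the abstract does not specify the aggregation scheme.

```python
# Hedged sketch of the ensemble aggregation: average the class probabilities
# from several versions of the model (e.g. snapshots updated on different
# external data). The soft-voting rule is an assumption.
import torch

def ensemble_predict(models, ct_volume):
    """Soft-vote over model versions: mean of softmax outputs."""
    probs = [m(ct_volume).softmax(dim=-1) for m in models]
    avg = torch.stack(probs).mean(dim=0)   # (batch, 3): COVID-19 / CAP / Normal
    return avg.argmax(dim=-1), avg
```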
Reverse Transcription-Polymerase Chain Reaction (RT-PCR) is currently the gold standard in COVID-19 diagnosis. It can, however, take days to provide the diagnosis, and its false negative rate is relatively high. Imaging, in particular chest Computed Tomography (CT), can assist with the diagnosis and assessment of this disease. Nevertheless, it has been shown that standard-dose CT scans impose a significant radiation burden on patients, especially those in need of multiple scans. In this study, we consider low-dose and ultra-low-dose (LDCT and ULDCT) scan protocols that reduce the radiation exposure close to that of a single X-ray, while maintaining an acceptable resolution for diagnosis purposes. Since thoracic radiology expertise may not be widely available during the pandemic, we develop an Artificial Intelligence (AI)-based framework using a collected dataset of LDCT/ULDCT scans, to study the hypothesis that the AI model can provide human-level performance. The AI model uses a two-stage capsule network architecture and can rapidly classify COVID-19, Community-Acquired Pneumonia (CAP), and normal cases, using LDCT/ULDCT scans. The AI model achieves COVID-19 sensitivity of 89.5% ± 0.11, CAP sensitivity of 95% ± 0.11, normal cases sensitivity (specificity) of 85.7% ± 0.16, and accuracy of 90% ± 0.06. By incorporating clinical data (demographics and symptoms), the performance further improves to COVID-19 sensitivity of 94.3% ± 0.05, CAP sensitivity of 96.7% ± 0.07, normal cases sensitivity (specificity) of 91% ± 0.09, and accuracy of 94.1% ± 0.03. The proposed AI model achieves human-level diagnosis based on LDCT/ULDCT scans with reduced radiation exposure. We believe that the proposed AI model has the potential to assist radiologists in accurately and promptly diagnosing COVID-19 infection and to help control the transmission chain during the pandemic.
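The reported gain from adding clinical data suggests a fusion step such as the following hedged sketch, where the two-stage capsule network is abstracted into a generic image-feature extractor; the concatenation-based late fusion and all dimensions are assumptions for illustration.

```python
# Hedged sketch of fusing imaging features with clinical inputs
# (demographics and symptoms); not the paper's exact fusion scheme.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, img_dim=64, clin_dim=8, n_classes=3):
        super().__init__()
        self.clin_mlp = nn.Sequential(nn.Linear(clin_dim, 16), nn.ReLU())
        self.head = nn.Linear(img_dim + 16, n_classes)   # COVID-19 / CAP / Normal

    def forward(self, img_feats, clinical):
        z = torch.cat([img_feats, self.clin_mlp(clinical)], dim=-1)
        return self.head(z)

logits = FusionClassifier()(torch.rand(4, 64), torch.rand(4, 8))
```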
Charisma is considered one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence's (AI) perspective to provide it with such skill. Beyond that, a plethora of use cases opens up for computational measurement of human charisma, such as for tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples include influence (could help) and affability (would help) in scientific studies or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions for humanoid robots or virtual agents seems accomplishable. Beyond that, automatic measurement also appears quite feasible with the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we therefore present a blueprint for building machines that can appear charismatic, but also analyse the charisma of others. To this end, we first provide the psychological perspective including different models of charisma and behavioural cues of it. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modelling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no labelled benchmark for this task. We address this gap by introducing continuous valence and arousal annotations for an existing dataset of children's stories annotated with discrete emotion categories. We collect additional annotations for this data and map the originally categorical labels to the valence and arousal space. Leveraging recent advances in Natural Language Processing, we propose a set of novel Transformer-based methods for predicting valence and arousal signals over the course of written stories. We explore several strategies for fine-tuning a pretrained ELECTRA model and study the benefits of considering a sentence's context when inferring its emotionality. Moreover, we experiment with additional LSTM and Transformer layers. The best configuration achieves a Concordance Correlation Coefficient (CCC) of .7338 for valence and .6302 for arousal on the test set, demonstrating the suitability of our proposed approach. Our code and additional annotations are made available at https://github.com/lc0197/emotion_modelling_stories.
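For reference, the Concordance Correlation Coefficient reported above is Lin's CCC and can be computed as follows; this is the standard definition, not code from the authors' repository.

```python
# Lin's Concordance Correlation Coefficient between two 1-D signals:
# CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
import torch

def ccc(pred, gold):
    mp, mg = pred.mean(), gold.mean()
    vp, vg = pred.var(unbiased=False), gold.var(unbiased=False)
    cov = ((pred - mp) * (gold - mg)).mean()
    return 2 * cov / (vp + vg + (mp - mg) ** 2)
```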
Automatic video captioning aims for a holistic visual scene understanding. It requires a mechanism for capturing temporal context in video frames and the ability to comprehend the actions and associations of objects in a given timeframe. Such a system should additionally learn to abstract video sequences into sensible representations as well as to generate natural written language. While the majority of captioning models focus solely on the visual inputs, little attention has been paid to the audiovisual modality. To tackle this issue, we propose a novel two-fold approach. First, we implement a reward-guided KL Divergence to train a video captioning model which is resilient towards token permutations. Second, we utilise a Bi-Modal Hierarchical Reinforcement Learning (BMHRL) Transformer architecture to capture long-term temporal dependencies of the input data as a foundation for our hierarchical captioning module. Using our BMHRL, we show the suitability of the HRL agent in the generation of content-complete and grammatically sound sentences by achieving $4.91$, $2.23$, and $10.80$ in BLEU3, BLEU4, and METEOR scores, respectively on the ActivityNet Captions dataset. Finally, we make our BMHRL framework and trained models publicly available for users and developers at https://github.com/d-rothen/bmhrl.
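The reward-guided KL divergence is not specified in detail here; one plausible reading, sketched below under that assumption, scales each token's KL term between the predicted and target distributions by a scalar reward. The exact BMHRL formulation may differ.

```python
# Hedged sketch of a reward-weighted, KL-based token loss; the function name
# and the per-sequence scalar reward are assumptions for illustration.
import torch
import torch.nn.functional as F

def reward_guided_kl(logits, target_probs, reward):
    """logits: (batch, seq, vocab); target_probs: same shape; reward: (batch,)."""
    log_p = F.log_softmax(logits, dim=-1)
    kl = F.kl_div(log_p, target_probs, reduction="none").sum(-1)  # (batch, seq)
    return (reward.unsqueeze(-1) * kl).mean()                     # reward scales each token's KL
```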
One of the major challenges in acoustic modelling of child speech is the rapid changes that occur in the children's articulators as they grow up, their differing growth rates and the subsequent high variability in the same age group. These high acoustic variations along with the scarcity of child speech corpora have impeded the development of a reliable speech recognition system for children. In this paper, a speaker- and age-invariant training approach based on adversarial multi-task learning is proposed. The system consists of one generator shared network that learns to generate speaker- and age-invariant features connected to three discrimination networks, for phoneme, age, and speaker. The generator network is trained to minimize the phoneme-discrimination loss and maximize the speaker- and age-discrimination losses in an adversarial multi-task learning fashion. The generator network is a Time Delay Neural Network (TDNN) architecture while the three discriminators are feed-forward networks. The system was applied to the OGI speech corpora and achieved a 13% reduction in the WER of the ASR.
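A common way to realize "minimize the phoneme-discrimination loss and maximize the speaker- and age-discrimination losses" is a gradient reversal layer, sketched below; whether the paper uses gradient reversal or alternating updates is not stated, so treat this as an assumption. The TDNN generator and the feed-forward discriminators are abstracted away.

```python
# Hedged sketch of the adversarial multi-task objective via gradient reversal:
# the reversed path pushes the shared generator toward speaker- and
# age-invariant features while the phoneme head is trained normally.
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None   # flip gradients flowing into the generator

def adversarial_loss(feats, phone_head, spk_head, age_head, y_ph, y_sp, y_ag, lamb=1.0):
    ce = torch.nn.functional.cross_entropy
    rev = GradReverse.apply(feats, lamb)    # reversed path for the two adversaries
    return ce(phone_head(feats), y_ph) + ce(spk_head(rev), y_sp) + ce(age_head(rev), y_ag)
```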
Humor is a substantial element of human affect and cognition. Its automatic understanding can facilitate more natural human-device interaction and the humanization of artificial intelligence. Current methods of humor detection are based solely on staged data, making them inadequate for "real-world" applications. We address this deficiency by introducing the novel Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humor and its dimensions (sentiment and direction), as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humor recognition is analyzed, and their complementarity is investigated. Our findings suggest that for the automatic analysis of humor and its sentiment, facial expressions are most promising, while humor direction can best be modeled via text-based features. The results reveal considerable differences between subjects, highlighting the individuality of humor usage and style. Further, we observe that decision-level fusion yields the best recognition results. Finally, we make our code publicly available at https://www.github.com/eihw/passau-sfch. The Passau-SFCH dataset is available upon request.
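Since decision-level fusion yields the best results, a minimal sketch of that step is given below: each modality model outputs a humor probability and the decisions are combined by weighted averaging. The equal weights are illustrative, not the paper's values.

```python
# Hedged sketch of decision-level fusion across text, audio, and video.
import numpy as np

def decision_level_fusion(p_text, p_audio, p_video, weights=(1/3, 1/3, 1/3)):
    """Combine per-modality humor probabilities; returns the fused probability."""
    probs = np.stack([p_text, p_audio, p_video], axis=0)   # (3, n_samples)
    fused = np.tensordot(np.asarray(weights), probs, axes=1)
    return fused                                           # threshold at 0.5 for a binary decision
```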
We consider open federated learning (FL) systems, where clients may join and/or leave the system during the FL process. Given the variance in the number of clients present, convergence to a fixed model cannot be guaranteed in open systems. Instead, we resort to a new performance metric that we call the stability of the open FL system, which quantifies the magnitude of the learned model in open systems. Under the assumption that local clients' functions are strongly convex and smooth, we theoretically quantify the radius of stability for two FL algorithms, namely local SGD and local Adam. We observe that this radius depends on several key parameters, including the function condition number and the variance of the stochastic gradient. Our theoretical results are further verified by numerical simulations on both synthetic and real-world benchmark datasets.
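A toy simulation of the open-system setting (clients joining and leaving between rounds of local SGD, followed by server averaging) is sketched below; it illustrates the setting only and does not reproduce the paper's stability analysis. All constants are arbitrary.

```python
# Toy open FL system: a random subset of clients is present each round,
# runs local SGD on a simple quadratic objective, and the server averages.
import numpy as np

rng = np.random.default_rng(0)
optima = rng.normal(size=(20, 5))                 # each client's local optimum
w = np.zeros(5)                                   # global model

for rnd in range(100):
    present = rng.choice(20, size=rng.integers(3, 10), replace=False)  # open system
    updates = []
    for c in present:
        wc = w.copy()
        for _ in range(5):                        # local SGD on f_c(w) = ||w - opt_c||^2 / 2
            grad = wc - optima[c] + 0.01 * rng.normal(size=5)
            wc -= 0.1 * grad
        updates.append(wc)
    w = np.mean(updates, axis=0)                  # server averaging

print("distance of final model from mean optimum:", np.linalg.norm(w - optima.mean(0)))
```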